Abstract

How do humans respond to indirect social influence when making decisions? We analysed 
an experiment where subjects had to repeatedly guess the correct answer to factual questions, 
while having only aggregated information about the answers of others. While the response 
of humans to aggregated information is a widely observed phenomenon, it has not been in- 
vestigated quantitatively, in a controlled setting. We found that the adjustment of individual 
guesses depends linearly on the distance to the mean of all guesses. This is a remarkable, 
and yet surprisingly simple, statistical regularity. It holds across all questions analysed, even 
though the correct answers differ by several orders of magnitude. Our finding supports the
assumption that individual diversity does not affect the response to indirect social influence. 
It also complements previous results on the nonlinear response in information-rich scenarios. 
We argue that the nature of the response to social influence crucially changes with the level 
of information aggregation. This insight contributes to the empirical foundation of models 
for collective decisions under social influence. 



To what extent are the opinions we hold about subjective matters the result of our own consider- 
ations or a reflection of the opinions of others? Even though we would like to believe the former, 
in most real-life situations individual opinions are highly interdependent. They are, directly or
indirectly, influenced by cultural norms, mass media and interactions in social networks. The
combined effect of these influences is known as social influence: individuals acting in accordance
with the beliefs and expectations of others (Kahan 1997). Social influence can be categorised
as direct or indirect. The former is the result of one individual directly affecting the opinion of
another, typically through coercion or persuasion. The latter is a more subtle psychological process
and takes place when one's opinion and behaviour are influenced by the availability of information
about others' actions. Our main focus in this paper is on the second form; we therefore treat
social influence as implicitly indirect.

Social influence can be readily observed in common collective decision processes, e.g. political
polls (Mutz 1992), panic stampedes (Helbing et al. 2000), stock markets (Hirshleifer and Teoh
2003), cultural markets (Salganik et al. 2006), or aid campaigns (Schweitzer and Mach 2008).
Some of these collective decisions can trap a population in a suboptimal state, for example a
financial bubble due to financial actors' herding behaviour (Prechter 2001). Alternatively, they
may steer a system into positive directions, such as increased tax compliance rates (Wenzel
2005). However, understanding how such collective decisions are formed, evaluating their benefit
for the population, and even directing their outcomes, is conditional on quantifying how people
perceive and respond to social influence.

Theoretical work in this field requires specifying a social structure together with the mechanisms
by which influence exerted by that social structure is internalised by the individuals (Castellano
et al. 2009). Typically, it is considered that individuals form opinions in an interaction network
(defined in terms of their social acquaintances) in which they are subject to complex
inter-personal influences.

As early as 1956, French postulated a theory of social power, in which social structure is
represented as an explicit interaction network (French 1956). An individual adopts an opinion that
equals the mean of their own opinion and the opinions of those they interact with. Assuming that
knowledge about the opinion of others is available, the theory predicts that well-connected
populations invariably reach consensus.
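
The averaging rule described above can be illustrated with a minimal sketch. It assumes synchronous updates and a small hypothetical interaction network (four individuals on a line); the function names, the network and the starting opinions are illustrative choices and are not taken from French (1956).

```python
def french_step(opinions, neighbours):
    """One synchronous update: each individual adopts the mean of their own
    opinion and the opinions of the individuals they interact with."""
    updated = []
    for i, own in enumerate(opinions):
        group = [own] + [opinions[j] for j in neighbours[i]]
        updated.append(sum(group) / len(group))
    return updated


def iterate_until_consensus(opinions, neighbours, tol=1e-9, max_steps=10_000):
    """Repeat the averaging step until the spread of opinions falls below tol."""
    for step in range(max_steps):
        if max(opinions) - min(opinions) < tol:
            return opinions, step
        opinions = french_step(opinions, neighbours)
    return opinions, max_steps


# Hypothetical example: four individuals on a line, 0 - 1 - 2 - 3.
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
final, steps = iterate_until_consensus([10.0, 20.0, 40.0, 80.0], neighbours)
print(final, steps)
```

In this connected toy network the spread of opinions shrinks at every step, which is the consensus behaviour the theory predicts for well-connected populations.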

Later, social psychologists and mathematicians extended and built upon French's social
power theory. Prominent works account for weighted averaging of others' opinions (Friedkin
1986), probability distributions of opinions (DeGroot 1974), and the importance of positioning in
the interaction network (Friedkin and Johnsen 1997). In particular, Latane made a notable
quantitative contribution with his social impact theory (Latane 1981), which showed via
empirical evidence that the fraction of individuals conforming to a group opinion is a power function
of the group size (with exponent less than 1). Recent research has also shown how the
identification of an individual with a group affects the final distribution of opinions (Groeber et al.
2009). In most models based on interaction networks, it is usually found that individuals respond
in a highly non-linear manner, leading for example to opinion fragmentation, due to the complexities
involved in inter-personal influences (Hegselmann and Krause 2002).



In this paper, we contribute to these theoretical investigations by analysing a decision-making
experiment based on aggregate information instead of on explicit interaction networks. Our
approach assumes that in some decision-making scenarios it is not always possible to have full
information about others' opinions. Instead, only some sort of aggregated representation of all
opinions is available, which arguably provides less information. For example, individual
compliance with social norms has been shown to depend on knowing the average compliance rate in
the population (Groeber and Rauhut 2010). Other examples include book purchases being
influenced by best-seller lists that are typically compiled from average book store sales
(Bikhchandani et al. 1998), or recommender systems offering buyers products whose quality has been
estimated as the average of all ratings (Hu et al. 2009). We are, therefore, interested in evaluating
whether individuals react differently when subjected to limited information compared to the
non-linear response with full information.

Quantification of human responses to aggregated information is scarce. We present empirical
evidence of how individuals react to it in a controlled environment. The empirical study we
analyse was conducted by Lorenz et al. (2011a). In this experiment individuals were asked to
guess the correct answer to six quantitative questions with an objective answer (such as "What
is the border length between Switzerland and Italy?") repeatedly over five experimental rounds
(see Table 1). Subjects were assigned to three different treatments in which they had (i) no
information about others' guesses during all rounds, (ii) the mean of all guesses in the previous
round or (iii) full information about others' estimates. Here, we focus on (ii), and report a
statistically significant linear dependence between the change in one's estimate and the distance
of the previous estimate from the mean.



Results 

We analyse the following set-up: a set of $N$ subjects were asked six quantitative questions with
a clearly defined objective truth. Individuals did not know a priori the true answers, and thus
could only provide a guess. Each question was repeated for five consecutive rounds. At the end
of each round, the subjects were presented with either some or no information about others'
guesses, after which they could revise their own estimate. Let $x_i(t)$ be the guess of individual
$i \in [1, N]$ at round $t \in [1, 5]$ for a particular question. The arithmetic average of all $N$
individuals at time $t$ is then denoted as $\bar{x}(t)$. In the aggregate regime subjects are
presented with $\bar{x}(t)$ at the end of round $t$ before making their next guess $x_i(t+1)$. We
study how the change in one's opinion, $\Delta x_i(t) = x_i(t) - x_i(t-1)$, is related to its
distance from the mean in the previous time step, $\bar{x}(t-1) - x_i(t-1)$. From the experimental
data (Lorenz et al. 2011b), we can calculate $\Delta x_i(t)$ and $\bar{x}(t-1) - x_i(t-1)$ across
all rounds, subjects, questions and sessions.
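
As a sketch of how these two quantities could be extracted from a table of guesses, the snippet below pairs each change of guess with the preceding distance to the mean. The data layout (one list of guesses per round) and the numbers are assumptions made for illustration; they are not the experimental data.

```python
def adjustment_pairs(guesses):
    """guesses[t][i] is the guess of subject i in round t (rounds in order).

    Returns a list of (distance, change) pairs, where
      distance = mean(t-1) - x_i(t-1)   and   change = x_i(t) - x_i(t-1).
    """
    pairs = []
    for t in range(1, len(guesses)):
        previous, current = guesses[t - 1], guesses[t]
        mean_previous = sum(previous) / len(previous)
        for x_prev, x_now in zip(previous, current):
            pairs.append((mean_previous - x_prev, x_now - x_prev))
    return pairs


# Made-up guesses for 3 subjects over 3 rounds (not experimental data).
toy = [
    [100.0, 200.0, 600.0],
    [150.0, 250.0, 500.0],
    [200.0, 280.0, 420.0],
]
for distance, change in adjustment_pairs(toy):
    print(f"distance={distance:8.1f}  change={change:8.1f}")
```

With R rounds and N subjects this yields N x (R - 1) pairs, because the first round has no preceding guess.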

At the finest granularity of the data, there are $N = 12$ subjects answering a given question for a
given information condition over five rounds. Since the first round has no preceding guess, each
subject contributes four changes, so in total one would have 12 x 4 = 48 data points.
Considering, however, that each question was asked in four sessions at a given information condition
(see Table 1), we pool these responses together to produce 48 x 4 = 192 samples per information
condition and per question. In Figure 1 we have plotted typical $\Delta x_i(t)$ vs.
$\bar{x}(t-1) - x_i(t-1)$ for two questions. The left column shows that in the no information
regime there is no particular dependency between the distance to the average and the ensuing
adjustment of one's guess. In contrast, there is a positive linear relation in the aggregate
information regime.

We formalise this qualitative argument by the following linear regression model:

$$\Delta x_i(t) = \beta_0 + \beta_1 \left( \bar{x}(t-1) - x_i(t-1) \right) + \epsilon_i(t), \qquad (1)$$

with the associated null hypothesis $\mathcal{H}_0: \beta_1 = 0$, and two-sided alternative $\mathcal{H}_1: \beta_1 \neq 0$.
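
A minimal sketch of fitting such a one-regressor model by ordinary least squares is given below. It uses the classical (homoscedastic) standard error for the slope, which is a simplifying assumption of this example; the function name and the data are illustrative only.

```python
import math


def ols_fit(x, y):
    """Ordinary least squares for y = b0 + b1 * x with one regressor.

    Returns (b0, b1, se_b1, t_b1), where se_b1 is the classical
    (homoscedastic) standard error and t_b1 = b1 / se_b1.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in residuals) / (n - 2)  # n - 2 degrees of freedom
    se_b1 = math.sqrt(sigma2 / sxx)
    return b0, b1, se_b1, b1 / se_b1


# Made-up data: distance to the mean (x) and change of guess (y).
distance = [-300.0, -100.0, 0.0, 50.0, 150.0, 200.0]
change = [-160.0, -40.0, 10.0, 20.0, 70.0, 110.0]
b0, b1, se_b1, t_b1 = ols_fit(distance, change)
print(f"b0={b0:.2f}  b1={b1:.3f}  se={se_b1:.3f}  t={t_b1:.1f}")
```

The t-value returned here is the quantity compared against the null hypothesis that the slope is zero; with n samples it has n - 2 degrees of freedom.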

Due to the experimental set-up, in particular the nature of the questions, subjects did not have a
solid idea about the true answers. However, the questions were not so hard as to prevent educated
guesses about the approximate order of magnitude. Lorenz et al. (2011a) note that the initial
opinion distribution for each question is right-skewed: a majority of estimates are low and a
minority fall on a fat right tail. Nevertheless, in Methods, we justify using Eq. 1 to model the
aggregate information regime.

It is important to mention that, in principle, regression models, such as ours, cannot make
explicit claims regarding cause and effect. Rather, the primary goal is to mathematically derive
one variable from the other with as high fidelity as possible. We posit that in the empirical case
considered here, one is able to infer the main causality direction, because the study was designed
with the main purpose of evaluating how social influence affects one's decisions: subjects
were exposed to social information prior to their decision making. We therefore argue that
in the aggregate regime, one of the main causes for an opinion change is knowledge of the mean
(other causes being unobservable factors, such as conviction in own opinion, beliefs about others'
expertise, etc.).

Table 4 shows all results of estimating the linear model. We focus primarily on the estimation
of $\beta_1$, as the constant term, $\beta_0$, is heavily influenced by a few outliers, and thus exhibits large
standard errors even when significant. From the reported p-values, we see that the impact of the
distance to the mean opinion, $\bar{x}(t-1) - x_i(t-1)$, is highly significant across all questions (with
low robust standard errors) in explaining one's own opinion change. Furthermore, the size of the effect
shows that knowledge of the mean accounts for a considerable part of the opinion change.

Discussion 

Our main goal in this paper was to quantify how people respond to social influence when making 
decisions. In particular, we focused on a limited-information scenario, in which individuals pos- 
sessed the mean of all opinions. This form of indirect social influence is prevalent in a wide range 
of collective decisions, e.g. norm compliance, product recommendations and purchases. Quanti- 
fying individual human behaviour in such contexts contributes to understanding such collective 
decisions. 

We used a unique dataset from an experiment in which subjects had to guess the answer to 
quantitative questions repeatedly, while knowing the mean of all guesses. We studied how the 
change in individual guesses relates to their distance from the mean. Our analysis shows that 
a linear model is sufficient to explain this relationship for all experimental questions, with a 
significant and considerable impact. Furthermore, this finding holds for questions with correct 
answers that differ by about 10 orders of magnitude. Therefore, we emphasize that the result is
not a first-order approximation of a non-linear regime around a narrow range of $\bar{x} - x_i$.

Our quantitative insights represent a striking statistical regularity. Despite individual differences 
in subjects, e.g. emotions, conviction in one's own opinion, beliefs about the competency of oth- 
ers, and tendency to conform to the group opinion, the same mathematical relationship underlies 
the individual reactions to social influence. This suggests that once initial guesses are formed,
diversity among subjects does not play a role in the adjustment of subsequent estimates. Moreover,
we argue that the linear nature of the response is due to the level of information aggregation
in the experiment. We believe that the availability of more fine-grained information, such as 
allowing group interactions or providing the opinion distribution, would recover the complex 
non-linear response found in most models of social influence. 

Our finding also contributes to the design of agent-based models for collective decisions. Such 
models play an important role in testing individual-level interaction mechanisms that lead a 
population to favourable collective decisions. While most prominent models rely on ad-hoc as- 
sumptions about individual behaviour (e.g. linear voter model, Schelling's segregation model), 
with the increasing availability of experimental data, there is a growing interest in basing these 
assumptions on empirical regularities. The rule we revealed can, therefore, be used to further 
model, quantify and design collective decisions under aggregated information. 



Methods 

The model is estimated by the method of Ordinary Least Squares (OLS), which is based on
the following assumptions: (a) $E(\epsilon_i \mid x_i) = 0$ (linear model is correct), (b)
$\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ (normality of the error distribution), (c)
$\mathrm{Var}(\epsilon_i \mid x_i) = \sigma^2$ (homoscedasticity), and (d)
$E(\epsilon_i \epsilon_j) = 0$ for $i \neq j$ (independence of errors). First, to assess the
overall feasibility of the linear model, we plot the residuals from the OLS estimation of Eq. 1
versus the fitted values, commonly known as a Tukey-Anscombe plot (Figure 2). A strong trend in
the plot is evidence that the linear model is not suitable, and consequently that (a) is violated.

For the no-information case, arguably, it is not reasonable to expect Eq. 1 to be valid as subjects
did not have access to any information. Thus, any causal relation between $\Delta x_i(t)$ and
$\bar{x}(t-1) - x_i(t-1)$ can be ruled out a priori.

As seen in Figure 2, the residuals in the no information regime do not fluctuate randomly around
the fitted values, which is strong evidence against assumption (a). In the aggregate information
case, by contrast, the Tukey-Anscombe plots do not exhibit a visible dependence
between residuals and model fit, and thus support assumption (a).

To actually quantify the presence of a trend in Figure 2, we compute the mutual information
(MI) between the fitted values and their residuals. The concept of mutual information
originates in information theory, and, intuitively speaking, measures the amount of information that
two variables share, i.e. how much knowing one of these variables reduces uncertainty about
the other (Cover and Thomas 2006, Chap. 2). Formally, the mutual information, $I(X,Y)$,
between variables $X$ and $Y$ equals $H(X) + H(Y) - H(X,Y)$, where $H(X)$ is the information
(entropy) in $X$, and $H(X,Y)$ is the joint entropy of $X$ and $Y$. If $X$ and $Y$ are
independent then $H(X,Y) = H(X) + H(Y)$, and thus the mutual information, $I(X,Y)$, equals 0.
We also make use of the inequality $I(X,Y) \le \min\{H(X), H(Y)\}$ to derive the normalisation
$I_{\mathrm{norm}}(X,Y) = I(X,Y) / \min\{H(X), H(Y)\}$. In this way our MI estimate has an upper
bound of 1, which is attained only if $X$ and $Y$ are identical.
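
The normalised quantity can be sketched for continuous data once the values are discretised. The example below assumes equal-width binning with a hypothetical default of four bins; the discretisation actually used in the analysis is not stated in the text, so this choice is an assumption of the sketch.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def equal_width_bins(values, n_bins):
    """Map each value to the index of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant data
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def normalised_mi(x, y, n_bins=4):
    """I(X,Y) / min{H(X), H(Y)} computed on binned data."""
    bx, by = equal_width_bins(x, n_bins), equal_width_bins(y, n_bins)
    hx, hy = entropy(bx), entropy(by)
    hxy = entropy(list(zip(bx, by)))
    mi = hx + hy - hxy
    smallest = min(hx, hy)
    return mi / smallest if smallest > 0 else 0.0


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(normalised_mi(x, x))  # identical variables
print(normalised_mi([0, 0, 1, 1], [0, 1, 0, 1], n_bins=2))  # independent pattern
```

Applied to fitted values and residuals, a value near 0 would indicate no detectable dependence, while values closer to 1 would indicate a trend in the Tukey-Anscombe plot. The estimate depends on the number of bins.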

The advantage of computing MI is that it is not only sensitive to linear correlations, but also to
non-linearities that are not captured in the covariance (Kraskov et al. 2004). The MI
estimations for all questions are shown above each plot in Figure 2. Unsurprisingly, there is stronger
dependency between residuals and fitted values in the no-information regime, especially where
a trend is clearly visible. In contrast, all questions in the aggregate regime show very low values
of MI.

Second, in Figure 3 we check normality of errors by plotting the quantiles of the residual
distribution against the quantiles of a normal distribution. The off-diagonal points in all questions clearly
indicate the presence of a few large outliers, as expected for skewed data. Non-normality of
residuals plays no role for the BLUE (best linear unbiased estimator) properties of OLS estimators,
provided (a) and (c) hold (the homoscedasticity assumption is evaluated below). However, exact
t and F statistics will be incorrect. Therefore, we make use of the relatively large sample size in
all questions to justify the asymptotic normality property of the OLS estimators (Baltagi 2011,
Chap. 5). It can be shown that by employing the central limit theorem and conditional on (a)
and (c), OLS produces estimators that are approximately normal (Wooldridge 2005, Chap. 5),
hence t-tests can be carried out in the same way.

Next, we verify the homoscedasticity assumption, (c), of $\epsilon_i(t)$. To this end, we run the Koenker
studentised version of the Breusch-Pagan test (Koenker 1981). This test regresses the squared
residuals on the predictor in Eq. 1 and uses the more widely applied Lagrange Multiplier (LM)
statistic instead of the F-statistic. Although more sophisticated procedures, e.g. White's test,
would account for a non-linear relation between the residuals and the predictor, we find that
the Breusch-Pagan test is sufficient to detect heteroscedasticity in the data. Table 2 shows that
the null hypothesis of homoscedastic error can be rejected with high significance for Questions
1, 2, 4, and 5. The consequence for the OLS method is that the estimated variance of $\beta_1$ will be
biased, hence the statistics used to test hypotheses will be invalid. Furthermore, none of the OLS
estimators will be asymptotically normal. Thus, to account for the presence of heteroscedasticity,
we use robust standard errors.
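
The two steps above can be sketched as follows: the studentised LM statistic is computed as n times the R-squared of the auxiliary regression of squared residuals on the predictor, and the slope's robust standard error uses the simplest White (HC0) form. Which correction variant was used in the analysis is not stated in the text, so HC0 is an assumption of this sketch, and the data are made up.

```python
import math


def fit_line(x, y):
    """Return (intercept, slope, residuals) of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return intercept, slope, [yi - intercept - slope * xi for xi, yi in zip(x, y)]


def koenker_lm(x, residuals):
    """Studentised Breusch-Pagan statistic: n * R^2 of squared residuals on x."""
    squared = [e * e for e in residuals]
    _, _, aux_resid = fit_line(x, squared)
    mean_sq = sum(squared) / len(squared)
    total = sum((s - mean_sq) ** 2 for s in squared)
    r2 = 1.0 - sum(u * u for u in aux_resid) / total
    return len(x) * r2


def hc0_slope_se(x, residuals):
    """White (HC0) heteroscedasticity-robust standard error of the slope."""
    mx = sum(x) / len(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    meat = sum(((xi - mx) ** 2) * e * e for xi, e in zip(x, residuals))
    return math.sqrt(meat) / sxx


# Made-up data whose noise grows with x (heteroscedastic by construction).
x = [float(v) for v in range(1, 21)]
y = [0.5 * v + 0.3 * v * (1 if i % 2 else -1) for i, v in enumerate(x)]
_, slope, resid = fit_line(x, y)
lm_stat = koenker_lm(x, resid)
robust_se = hc0_slope_se(x, resid)
print(lm_stat, robust_se)
```

With a single regressor the LM statistic is compared against a chi-square distribution with one degree of freedom.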

Finally, the serial correlation in (d) is tested by assuming the following AR(1) process for the
error term

$$\hat{\epsilon}_i(t) = \alpha_0 + \alpha_1 \hat{\epsilon}_i(t-1) + \xi_i(t), \qquad (2)$$

with $\hat{\epsilon}_i$ being the residuals from estimating Eq. 1 and $\xi_i(t) \sim \mathcal{N}(0, \nu^2)$. A one-period lag is sufficient
to model error correlation, given that subjects answered the same question over just 5 rounds.
In addition, by excluding the first guess when no information was available, we have effectively
4 periods. The OLS estimation of Eq. 2 in Table 3 indicates that $\alpha_1$ either is not significantly
different from 0 (Questions 3, 5 and 6) or has a small effect when significant (Questions 1 and
4). Consequently, inferences based on t-tests and F-tests can be carried out.
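
A first-order serial correlation check of this kind can be sketched by regressing each residual on its predecessor. The example assumes that lags are formed within each subject's series; how the lagged pairs were constructed in the analysis is not stated, so this is an assumption, and the residuals shown are made up.

```python
def lagged_pairs(residuals_by_subject):
    """Build (previous, current) residual pairs within each subject's series."""
    pairs = []
    for series in residuals_by_subject:
        pairs.extend(zip(series[:-1], series[1:]))
    return pairs


def ar1_coefficients(pairs):
    """Least-squares estimates (a0, a1) for current = a0 + a1 * previous."""
    n = len(pairs)
    mean_prev = sum(p for p, _ in pairs) / n
    mean_cur = sum(c for _, c in pairs) / n
    sxx = sum((p - mean_prev) ** 2 for p, _ in pairs)
    sxy = sum((p - mean_prev) * (c - mean_cur) for p, c in pairs)
    a1 = sxy / sxx
    return mean_cur - a1 * mean_prev, a1


# Made-up residual series for three subjects, four periods each.
residuals = [
    [4.0, 2.0, 1.0, 0.5],
    [-8.0, -4.0, -2.0, -1.0],
    [2.0, 1.0, 0.5, 0.25],
]
a0, a1 = ar1_coefficients(lagged_pairs(residuals))
print(a0, a1)
```

An estimate of the lag coefficient close to zero would indicate that consecutive residuals are essentially uncorrelated.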

All data analysis was done with R (http://www.r-project.org/, version 2.15.0). Quantile plots
of the residuals were generated with rqq (package lawstat, version 2.3). The Breusch-Pagan
heteroscedasticity test was implemented by bptest (package lmtest, version 0.9-29). Finally, to
estimate Eq. 1 we used the standard lm function with robust standard errors calculated by
hccm (package car, version 2.0-12). Mutual information was computed with multiinformation
(package infotheo, version 1.1.0).

                 Q1             Q2             Q3             Q4             Q5             Q6
no info          S1,S4,S7,S10   S2,S5,S8,S11   S3,S6,S9,S12   S1,S4,S7,S10   S2,S5,S8,S11   S3,S6,S9,S12
aggregate info   S3,S6,S9,S12   S1,S4,S7,S10   S2,S5,S8,S11   S3,S6,S9,S12   S1,S4,S7,S10   S2,S5,S8,S11
full info        S2,S5,S8,S11   S3,S6,S9,S12   S1,S4,S7,S10   S2,S5,S8,S11   S3,S6,S9,S12   S1,S4,S7,S10

Table 1: Experimental Setup. The experiment consisted of 12 sessions (S) each composed of 
12 subjects. In each session, the 12 subjects had to answer two questions (Q) in the 
no information, two in the aggregate and two in the control condition (see main text), 
for a total of six questions. The order of the questions was randomised across sessions. 
After each of the five rounds subjects were asked the same question again and could 
revise their answers depending on the information available to them. In the table, 
columns indicate the question number and rows the information regime. Each cell lists the
sessions when a given question was asked for a particular information regime. 





               Q1       Q2       Q3       Q4       Q5       Q6
LM statistic   0.46     11.84    0.0037   4.607    7.5679   1.1711
p-value        0.5      0.0005   0.95     0.03     0.005    0.28
samples        192      192      192      192      188      188

Table 2: Breusch-Pagan test for heteroscedasticity. Each column corresponds to one of
the six questions. Since the linear model has only one regressor, the Koenker version of
the test has one degree of freedom for all questions.
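
Because the test has one degree of freedom, the p-value belonging to an LM statistic can be recomputed in closed form. The sketch below uses the identity P(X > s) = erfc(sqrt(s / 2)) for a chi-square variable with one degree of freedom; the function name is illustrative, and the statistics are those listed in Table 2.

```python
import math


def chi2_sf_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > s) = erfc(sqrt(s / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# LM statistics as listed in Table 2, in question order.
lm_statistics = [0.46, 11.84, 0.0037, 4.607, 7.5679, 1.1711]
for question, lm in enumerate(lm_statistics, start=1):
    print(f"Q{question}: LM={lm:<8} p={chi2_sf_1df(lm):.4f}")
```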

              Estimate   Robust std. errors   t-value   p-value   N     df
Q1  alpha_0   -3.6       14.02                -0.26     0.79      191   189
    alpha_1   0.3        0.12                 2.47      0.01
Q2  alpha_0   0.46       12.61                0.04      0.97      191   189
    alpha_1   -0.19      0.1                  -2        0.05
Q3  alpha_0   7.2        836                  0.009     0.9       191   189
    alpha_1   0.03       0.07                 0.47      0.64
Q4  alpha_0   -1.88      22.36                -0.08     0.93      191   189
    alpha_1   0.05       0.16                 2.14      0.03
Q5  alpha_0   -0.32      14.9                 -0.02     0.9       187   185
    alpha_1   -0.07      0.05                 -1.43     0.15
Q6  alpha_0   -3.6       1388                 -0.003    0.99      187   185
    alpha_1   -0.01      0.07                 -0.19     0.85


Table 3: First-order serial correlation of residuals. 







             Estimate   Std. errors   Robust std. errors   t-value   p-value          samples   df
Q1  beta_0   -176.46    14.98         15.55                -11.35    < 2.2 x 10^-16   192       190
    beta_1   0.97       0.02          0.1                  9.57      < 2.2 x 10^-16
Q2  beta_0   35.33      12.6          12.9                 2.74      0.007            192       190
    beta_1   0.27       0.05          0.09                 2.89      0.004
Q3  beta_0   -1321.5    828.2         853                  -1.55     0.12             192       190
    beta_1   0.83       0.05          0.1                  6.25      2.7 x 10^-9
Q4  beta_0   -146.3     23.2          23.7                 -6.2      3.8 x 10^-9      192       190
    beta_1   0.6        0.01          0.03                 18.8      < 2.2 x 10^-16
Q5  beta_0   6.8        14.8          15.1                 0.5       0.66             188       186
    beta_1   0.4        0.04          0.1                  3.72      0.0003
Q6  beta_0   -821       103           1387                 -0.6      0.55             188       186
    beta_1   0.46       0.02          0.03                 15.3      < 2 x 10^-16

Table 4: Robust linear regression of Eq. 1. Uncorrected standard errors are reported for
comparison only. Last column shows degrees of freedom.

[Figure 1 panels: No Info and Aggregate Info conditions, Question 1 and Question 3; y-axis: $\Delta x_i(t)$]

Figure 1: Scatter plots for questions 1 (first and third column) and 3 (second and
fourth column). The green lines show median smoothing: the x-axis has been split
into equally sized bins of size 10 (arbitrary), and the medians in each bin are plotted.
The bottom row shows median smoothing with shaded areas corresponding to error
bars between the first and third quartile of each bin. Note the scaling of the x- and
y-axis.

Figure 2: Residuals vs. fitted values for both information conditions and all
questions. The first two rows show the no-information condition, while the last two show the
aggregate information condition. Questions are numbered from left to right and top
to bottom. The mutual information (MI) is shown on top of each plot (see Methods
for the definition of MI).

Figure 3: QQ Plots. Theoretical quantiles of a normal distribution versus sample quantiles
for all six questions. There are outliers in the data resulting in non-normal residuals.
Question numbers (Q) are indicated on the top left corner of each plot.